Documents
Documents enable you to expand an agent’s knowledge and ground its responses by adding information from attached files or HTTP/HTTPS URLs.
- Improved accuracy: By grounding responses in real-time retrieved data, this approach reduces hallucinations and enhances factual reliability.
- Dynamic adaptability: The model adjusts its responses based on the latest information from the knowledge source.
- Domain-specific knowledge: Ideal for use cases that demand specialized or frequently updated knowledge, such as customer support, research, or product documentation.
The AI Agents framework supports the following methods of working with documents:
- RAG in every query – implements the classic Retrieval-Augmented Generation (RAG) technique, which provides document extracts relevant to the user’s query.
- RAG via doc_search tool – implements an agentic RAG pipeline, where the LLM decides when it needs document extracts.
- Full content in prompt – includes the complete content of attached documents in the prompt. This mode is suitable for relatively short documents.
- Content via doc_get tool – implements an agentic pipeline, where the LLM can request content from specific documents or their sections.
For more details, see Using documents.
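To make the agentic modes concrete, here is a minimal sketch of what a doc_search-style tool could look like. This is purely illustrative: the retriever, the `Chunk` type, and the scoring are hypothetical stand-ins, not the framework’s actual API, and a real deployment would use embedding-based retrieval rather than keyword overlap.

```python
# Hypothetical sketch of an agentic doc_search tool: the LLM decides when
# to call it; everything here is illustrative, not the real LiveHub API.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float

def doc_search(query: str, index: dict[str, str], top_k: int = 3) -> list[Chunk]:
    """Naive keyword retriever standing in for a real vector-based retriever."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in index.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap:
            scored.append(Chunk(doc_id, text, overlap / len(query_words)))
    return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]
```

In the doc_search mode, the LLM emits a tool call with a query like this when it judges that it lacks the information, instead of receiving extracts with every prompt.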
Parsing documents
For RAG (Retrieval-Augmented Generation) processing, documents must be split into smaller, manageable chunks to ensure efficient retrieval and accurate context handling. Each chunk should contain a coherent piece of information, typically a few hundred words, so the model can understand it in isolation. Properly chunked documents improve retrieval relevance and help the LLM generate more precise and consistent responses.
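The idea of size-bounded chunks with shared context can be sketched as a simple word-count splitter. This is an illustrative example, not the framework’s implementation; the `size` and `overlap` parameters are assumptions chosen to match the “few hundred words” guideline above.

```python
# Illustrative chunker: split text into chunks of roughly `size` words,
# with `overlap` words repeated between consecutive chunks so each chunk
# retains some of the preceding context.
def chunk_words(text: str, size: int = 300, overlap: int = 30) -> list[str]:
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```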
LiveHub AI Agents parses documents using the following pipeline:
Documents are converted to Markdown format.
Markdown files are split into chunks based on their headings, keeping paragraphs intact. The target chunk size is user-specified but may vary depending on the content.
Oversized chunks are further divided into smaller pieces, with overlapping content to preserve context.
Each chunk is enriched with contextual metadata derived from the document’s description and header structure.
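The heading-based splitting step above can be sketched in a few lines. This is a simplified illustration under the assumption that a chunk boundary falls at every Markdown heading; the actual pipeline also enforces the target chunk size and preserves the header hierarchy as metadata.

```python
# Simplified sketch of heading-based splitting: cut a Markdown document
# at each heading so every section keeps its heading and the paragraphs
# beneath it intact.
def split_on_headings(markdown: str) -> list[str]:
    sections, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]
```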
In addition to the chunking pipeline, LiveHub AI Agents retains the original document in Markdown format to support the “full content” working modes, allowing the agent to access the complete source when needed.